Search results

1 – 2 of 2
Article
Publication date: 1 August 2016

Bao-Rong Chang, Hsiu-Fen Tsai, Yun-Che Tsai, Chin-Fu Kuo and Chi-Chung Chen

Abstract

Purpose

The purpose of this paper is to integrate and optimize a multiple big data processing platform with the features of high performance, high availability and high scalability in big data environment.

Design/methodology/approach

First, integrating Apache Hive, Cloudera Impala and BDAS Shark lets the platform support SQL-like queries. Next, users access a single interface, and the proposed optimizer automatically selects the best-performing big data warehouse platform. Finally, the distributed memory storage system Memcached, incorporated into the distributed file system Apache HDFS, is employed for fast caching of query results. Therefore, if users issue the same SQL command again, the result is returned rapidly from the cache instead of repeating the search in the big data warehouse, which would take longer.
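The caching step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: a plain dictionary stands in for Memcached, and the class name `QueryResultCache`, the `run_query` callback and the SHA-256 cache key are all assumptions introduced here.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Rows = List[Tuple]


class QueryResultCache:
    """Caches query results by SQL text (a dict stands in for Memcached)."""

    def __init__(self, run_query: Callable[[str], Rows]) -> None:
        # run_query is a hypothetical callback that executes SQL on the
        # selected warehouse engine (Hive, Impala or Shark).
        self._run_query = run_query
        self._store: Dict[str, Rows] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(sql: str) -> str:
        # Assumption: identical SQL text (ignoring surrounding whitespace)
        # identifies an identical query.
        return hashlib.sha256(sql.strip().encode("utf-8")).hexdigest()

    def query(self, sql: str) -> Rows:
        key = self._key(sql)
        if key in self._store:
            # Repeated command: answer from the cache, skip the warehouse.
            self.hits += 1
            return self._store[key]
        # First occurrence: run the query and remember the result.
        self.misses += 1
        rows = self._run_query(sql)
        self._store[key] = rows
        return rows
```

With this sketch, a second call to `query` with the same SQL string never reaches `run_query`, which is the behavior the abstract attributes to the cache.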

Findings

As a result, the proposed approach significantly improves overall performance and dramatically reduces search time when querying a database, especially for frequently repeated SQL commands in multi-user mode.

Research limitations/implications

Currently, Shark’s latest stable version, 0.9.1, does not support the latest versions of Spark and Hive. In addition, this series of software only supports Oracle JDK 7. Using Oracle JDK 8 or OpenJDK will cause serious errors, and some software will be unable to run.

Practical implications

One problem with this system is that some blocks go missing when too many blocks are stored for one result (about 100,000 records). Another is that writing sequentially into the in-memory cache wastes time.

Originality/value

When the remaining memory capacity on each server is 2 GB or less, Impala and Shark incur heavy page swapping, causing extremely low performance. At larger data scales, this may raise a JVM I/O exception and crash the program. However, when the remaining memory capacity is sufficient, Shark is faster than Hive and Impala. Impala’s consumption of memory resources lies between those of Shark and Hive. This amount of remaining memory is sufficient for Impala’s maximum performance. In this study, each server allocates 20 GB of memory for cluster computing and sets the amount of remaining memory as Level 1: 3 percent (0.6 GB), Level 2: 15 percent (3 GB) and Level 3: 75 percent (15 GB) as the critical points. The program automatically selects Hive when remaining memory is less than 15 percent, Impala at 15 to 75 percent and Shark at more than 75 percent.
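The selection rule stated above can be written as a short function. This is a sketch under stated assumptions: the function name `select_engine` and its parameters are hypothetical, the 20 GB default is taken from the abstract, and the boundary values (exactly 15 or 75 percent) are assigned to Impala because the abstract gives "15 to 75 percent" for it.

```python
def select_engine(free_gb: float, total_gb: float = 20.0) -> str:
    """Pick a query engine from the remaining memory on a server.

    Thresholds follow the abstract: Hive below 15 percent,
    Impala from 15 to 75 percent, Shark above 75 percent.
    """
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    fraction = free_gb / total_gb
    if fraction < 0.15:
        return "hive"
    if fraction <= 0.75:
        return "impala"
    return "shark"
```

For example, with the default 20 GB allocation, 0.6 GB of remaining memory (Level 1) maps to Hive, while 3 GB and 15 GB (Levels 2 and 3) both fall inside the Impala range.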

Article
Publication date: 1 August 2016

Chin-Fu Kuo, Yung-Feng Lu and Bao-Rong Chang

Abstract

Purpose

The purpose of this paper is to investigate the scheduling problem of real-time jobs executing on a DVS processor. The jobs must complete their executions by their deadlines, and energy consumption must also be minimized.

Design/methodology/approach

A two-phase energy-efficient scheduling algorithm is proposed to solve the scheduling problem for real-time jobs. In the off-line phase, the maximum instantaneous total density and the instantaneous total density (ITD) are proposed to derive the speed of the processor for each time instance. The derived speeds are saved for run time. In the on-line phase, the authors set the processor speed according to the derived speeds and set a timer to expire at the end time instance of the speed in use.
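The two-phase structure can be illustrated with a small sketch. The abstract does not define ITD, so the code assumes a common reading: a job's density is its work divided by the length of its release-to-deadline window, and the total density at an instant is the sum over jobs whose window covers that instant. The names `Job`, `offline_speed_table` and `online_speed` are hypothetical, and this is not presented as the authors' algorithm.

```python
from typing import List, NamedTuple, Optional, Tuple


class Job(NamedTuple):
    release: float   # earliest start time
    deadline: float  # latest finish time
    work: float      # execution requirement at unit speed


Segment = Tuple[float, float, float]  # (start, end, speed)


def offline_speed_table(jobs: List[Job]) -> List[Segment]:
    """Off-line phase: derive one speed per interval between event points."""
    points = sorted({t for j in jobs for t in (j.release, j.deadline)})
    table = []
    for start, end in zip(points, points[1:]):
        # Assumed density: work / window length, summed over jobs whose
        # window covers the whole interval [start, end).
        speed = sum(
            j.work / (j.deadline - j.release)
            for j in jobs
            if j.release <= start and end <= j.deadline
        )
        table.append((start, end, speed))
    return table


def online_speed(table: List[Segment], now: float) -> Tuple[float, Optional[float]]:
    """On-line phase: look up the saved speed and when its timer expires."""
    for start, end, speed in table:
        if start <= now < end:
            return speed, end
    return 0.0, None  # no job window covers this instant
```

At run time, the scheduler would apply the returned speed and arm a timer for the returned end time, then repeat the lookup when the timer fires.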

Findings

When the DVS processor executes a job at a proper speed, the energy consumption of the system can be minimized.

Research limitations/implications

This paper does not consider jobs with precedence constraints. These can be explored in future work.

Practical implications

The experimental results of the proposed schemes are presented to show their effectiveness.

Originality/value

The experimental results show that the proposed scheduling algorithm, ITD, can save energy and keep the processor fully utilized.

Details

Engineering Computations, vol. 33 no. 6
Type: Research Article
ISSN: 0264-4401
